Method for decoding compressed video data with a reduced memory requirement

ABSTRACT

For the purpose of bidirectional movement-compensated interpolation, it is proposed, when decoding compressed video sequences, completely to decode only one of the two reference frames. Only those respective picture areas of the other reference frame are decompressed which overlap with the picture area to be determined of the interpolated frame, taking account of a displacement corresponding to the movement compensation. As a result, a considerable reduction in the memory requirement of the decoding hardware is possible.

BACKGROUND OF THE INVENTION

The present standards for compressing video data, for example ISO/IEC 11172-2 (MPEG1) or ISO/IEC 13818-2 (MPEG2), combine the principles of prediction and transformation coding to produce so-called hybrid coding. The prediction is carried out here with the aid of a so-called differential pulse code modulation loop (DPCM loop), which generates differential frames by subtracting predicted video data from the original video data to be coded. The local correlations remaining in these differential frames between adjacent pixels are utilized with the aid of suitable transformation, preferably with the aid of discrete cosine transformation (DCT). The transformation coefficients produced in this case are subsequently subjected to quantization and transmitted after entropy coding. The fundamental mode of operation of this compression method is known to the person skilled in the art from various publications, for example from D. J. Le Gall, “The MPEG video compression algorithm”, Signal Processing: Image Communication 4 (1992) 129-140; International Standard ISO/IEC 11172-2: Coding of moving pictures and associated audio, ISO/MPEG, 1993(E); and Draft International Standard ISO/IEC 13818-2: Generic coding of moving pictures and associated audio, Mar. 25, 1994.

In order to improve the quality of the prediction in those picture areas in which moving objects occur, use is made of so-called movement-compensated prediction. The principles of the movement estimation required for this purpose and their application for the movement-compensated prediction are known to the person skilled in the art, for example from M. Bierling, “Displacement estimation by hierarchical block matching”, 3rd SPIE Symp. on Visual Commun., Cambridge, Mass., November 1988, and Draft International Standard ISO/IEC 13818-2: Generic coding of moving pictures and associated audio, Mar. 25, 1994. So-called movement-compensated interpolation is provided in addition to the movement-compensated prediction in the case of the said standardized methods for compressing video data. In connection with MPEG terminology, movement-compensated interpolation is also designated as bidirectional prediction. However, since this designation may easily be confused with movement-compensated prediction, the term movement-compensated interpolation is preferred in the context of this application. The picture quality is decisively improved by the use of movement-compensated interpolation, since it allows a satisfactory treatment of masked picture areas and contributes to improved noise suppression.

A distinction is made between three differently coded types of frames. So-called I-frames are transmitted without any chronological prediction, but rather are subjected only to intra-frame coding, preferably DCT coding with subsequent quantization of the transformation coefficients. In the context of this patent application, “intra-frame coding” is to be understood quite generally as any method which is suitable for treating local correlations in video data. The so-called P-frames are predicted with the aid of the DPCM loop from chronologically preceding I-frames or P-frames (forward prediction). The difference between the predicted frame and the actual frame is subjected to intra-frame coding, preferably DCT transformation with subsequent quantization of the transformation coefficients. So-called B-frames, which are also designated as “interpolated frames” in the context of the present patent application, are chronologically situated between an I-frame and a P-frame or between two P-frames. B-frames are determined by means of “bidirectional” movement-compensated interpolation from a chronologically preceding I- or P-frame and a chronologically succeeding I- or P-frame. In this case, the expressions “chronologically”, “succeeding” and “preceding” do not refer to the order in which these frames are transmitted in the data stream of compressed frames; rather, they refer to the order in which these frames are recorded/reproduced. In the same way as P-frames, B-frames, too, are coded in the form of quantized transformation coefficients of a differential frame.

In the case of currently known implementations, the reconstruction of a B-frame by means of movement-compensated interpolation from a chronologically preceding I- or P-frame and a chronologically succeeding I- or P-frame necessitates the provision of the two reference frames (which are also occasionally designated as support frames in the literature) in fully decoded form. Therefore, two fully decoded reference frames (I- or P-frames) have to be stored in a frame store in the case of the methods belonging to the prior art for carrying out movement-compensated interpolation. The re-interlacing during the video output requires further storage capacity. The overall required memory is a decisive cost factor in the hardware used for decoding and encoding. A reduction in the storage capacity required is therefore desirable.

SUMMARY OF THE INVENTION

The invention is based on the object of specifying a method for decoding compressed video data with a reduced memory requirement. This object is achieved according to the invention by a method for decoding compressed video data with a reduced memory requirement. In this case, only memory space taken up by one fully decoded first reference frame and one compressed second reference frame is required for storing the reference frame data when carrying out interpolation. A reduction in the memory requirement by, for example, at least 2.6 Mbits can be expected in a typical application (MPEG2/Main Level/Main Profile) with an assumed typical channel data rate of 10 Mbit/s. This reduction in the necessary storage capacity means that integration of the codec on a chip is substantially facilitated or even becomes possible in the first place, and the implementation becomes decisively more economical thereby. However, a prerequisite for this is an adequately fast hardware logic unit for the decoding/encoding or, if this is not the case, multiple replication of the corresponding hardware logic circuits.

The core of the invention is not to store the uncompressed data for the second reference frame but rather to keep in store only the compressed data in the (correspondingly enlarged) input buffer of the decoder. For the purpose of reconstructing a B-frame between two P- or I-frames (generally between two reference frames), only those picture areas which are instantaneously required for the video output are decoded from the compressed data stream. In the case of MPEG, these are, for example, those blocks of the second reference frame which overlap with the macroblock to be determined of the interpolated frame, taking into account a displacement corresponding to the movement compensation. In the example of frame prediction, 9 blocks each of 8 times 8 luminance pixels of the first reference frame are required for the luminance pixels of a macroblock to be decoded of the second reference frame.

The area of application of the invention includes not only purely decoding but also encoding video data, because an encoder which operates in accordance with the principle of prediction by DPCM coding always contains a decoder, which decodes the coded data again, as a result of which it becomes possible in the first place to calculate differential frames in a DPCM loop (D. J. Le Gall, “The MPEG video compression algorithm”, Signal Processing: Image Communication 4 (1992) 129-140).

An advantageous embodiment of the invention is one in which, for the purpose of reconstructing and outputting the second reference frame, after the interpolated frames are output, the memory space provided for the output of these frames is initially used to store the second reference frame in an uncompressed form.

It is further advantageous if

after this the store for the first reference frame is used to store the remainder of the second reference frame in an uncompressed form,

the remainder of the store for the first reference frame which is no longer required is used as an output memory for the succeeding interpolated frame, and if

the memory area which has now accommodated the completely uncompressed second reference frame contains the first reference frame for the decoding of the chronologically succeeding interpolated frames.

The invention can also be implemented with the aid of a circuit arrangement for decoding compressed video data, for example having the features according to one of the claims.

The invention is in no way restricted to the area of transformation coding, and is in no way whatsoever restricted to the area of block-by-block DCT coding. Since no preconditions have to be made concerning the type of intra-frame coding, the invention can be applied in connection with virtually all known methods, or methods to be developed in future, of intra-frame coding, for example also in connection with so-called quadtree coding or in connection with methods based on object segments. The decisive prerequisite for the applicability of the invention is the “bidirectional” movement-compensated interpolation.

BRIEF DESCRIPTION OF THE DRAWINGS

The features of the present invention which are believed to be novel are set forth with particularity in the appended claims. The invention, together with further objects and advantages, may best be understood by reference to the following description taken in conjunction with the accompanying drawings, in the several Figures of which like reference numerals identify like elements, and in which:

The invention will be explained in more detail below using preferred exemplary embodiments and with the aid of the figures.

FIG. 1 diagrammatically shows a block diagram of an MPEG2 video decoder, for example for digital television.

FIG. 2 diagrammatically shows a block diagram of an MPEG2 decoder with a reduced frame store requirement in accordance with the present invention.

FIG. 3 diagrammatically shows the relationship between a macroblock of a B-frame and the corresponding macroblocks, displaced by movement compensation, of the associated reference frames.

FIG. 4 diagrammatically shows the division of the frame store in accordance with the method according to the invention.

In these figures, the reference symbols used represent the following terms:

CDBS Bit stream of the compressed video data

BF Buffer

INVQ Inverse quantization

DCTC Transformation coefficients, DCT coefficients

IDCT Inverse DCT

MD Movement information, movement vectors

BSP Frame store

RBv Reference blocks for forward prediction

RBr Reference blocks for backward prediction

BSYN Picture synthesis

DCMB Decoded macroblock

AS Video output memory (for re-interlacing)

YUVRGB YUV/RGB conversion

MON Monitor

HA Header evaluation

VRLD Variable length/run length decoding

MC Movement compensation

ZGT Access table to a memory area

RB1, 2 Reference frames (support frames)

IB Interpolated frame

RF1 Reference frame 1

RF11 First half of the reference frame 1

RF12 Second half of the reference frame 1

DESCRIPTION OF THE PREFERRED EMBODIMENTS

The coding of frame sequences for the purpose of data compression takes place in accordance with a hybrid-DCT method, that is to say the utilization of the chronological correlation of the frames takes place by means of chronological movement-compensated prediction, while the utilization of the local correlation is given by the discrete cosine transformation (DCT). This DCT is carried out on blocks of 8 times 8 pixels: in each case 4 blocks of luminance Y, 1 block of chrominance U and 1 block of chrominance V are combined to form a macroblock, with the result that such a macroblock covers 16 times 16 pixels in a color picture. Macroblocks of a block line can, but do not have to, be combined to form so-called slices [2], [3]. Common items of information for these macroblocks are contained in the slice header.

A series of video frames is preferably split into groups of pictures (GOP), which may include, for example, 12 frames. Each group of pictures begins with an intra-frame-coded picture, also called an I-frame. Only the local correlation is used for the data compression of these frames; there is no recourse to previous frames and also no anticipation of future frames. This I-frame is used as a reference frame for a future, chronologically predicted P-frame. This prediction is carried out with the aid of the movement compensation. The chronological prediction error is transmitted as an intra-frame-coded differential frame. A P-frame is used, in turn, as a reference frame for the prediction of the next P-frame.

As shown in FIG. 1, for the purpose of decoding a bit stream of compressed video data, this bit stream CDBS is initially loaded into a buffer BF which, in the case of MPEG2, has a minimum size of, for example, 1.75 Mbits and from which the data can be read at a variable rate controlled by the decoder. The frames are present in this input data stream, for example, in the order

I1, P4, B2, B3, P7, B5, B6, P10, B8, B9, I13, B11, B12, P16, B14, . . .

the letter representing the type of frame and the associated number representing the position in the decoded bit stream, that is to say specifying the order in which the frames have to be shown.

That is to say, the B-frames B2 and B3 (interpolated frames), which belong to the instants 2 and 3, respectively, and are to be decoded by means of interpolation from the reference frames I1 and P4, chronologically follow, during transmission, the reference frames which belong to the recording instants 1 and 4. When reference is made in the context of this patent application to chronologically preceding or chronologically succeeding frames, what is always meant, however, is the order of recording/reproduction instants in which, for example, the P-frame P4 follows the B-frames B2 and B3.
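The relationship between transmission order and reproduction order can be illustrated with a minimal Python sketch. The function name `display_order` and the label format (frame type letter followed by the display index) are assumptions made for this illustration only; they mirror the notation of the example sequence above.

```python
import re

def display_order(coded_order):
    """Sort frame labels such as 'P4' by their display index.

    The label format (type letter + display position) follows the example
    sequence in the text; the function itself is only an illustration.
    """
    def index(label):
        match = re.fullmatch(r"([IPB])(\d+)", label)
        if match is None:
            raise ValueError(f"unexpected frame label: {label!r}")
        return int(match.group(2))

    return sorted(coded_order, key=index)

# Transmission order taken from the example sequence in the text.
coded = ["I1", "P4", "B2", "B3", "P7", "B5", "B6", "P10", "B8", "B9"]
shown = display_order(coded)
print(shown)
```

In the sorted list, P4 follows B2 and B3, which is the sense in which "chronologically succeeding" is used in this application.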

As is shown using Table 1 below, two frame stores are adequate for decoding the coded data stream, if the memory allocation below is used:

In this case, operations for format conversions (interlaced/progressive, block format/line format) in the video output memory were initially not taken into account.

TABLE 1 Allocation of the frame stores in the prior art

Step  Input      Decoding of  Contents of frame store A  Contents of frame store B  Output of
1     I₁ coded   I₁           I₁ is written in           -                          -
2     P₄ coded   P₄           I₁ is read out             P₄ is written in           I₁ decoded
3     B₂ coded   B₂           Accesses to I₁             Accesses to P₄             B₂ decoded
4     B₃ coded   B₃           Accesses to I₁             Accesses to P₄             B₃ decoded
5     P₇ coded   P₇           P₇ is written in           P₄ is read out             P₄ decoded
6     B₅ coded   B₅           Accesses to P₇             Accesses to P₄             B₅ decoded
7     B₆ coded   B₆           Accesses to P₇             Accesses to P₄             B₆ decoded
8     P₁₀ coded  P₁₀          P₇ is read out             P₁₀ is written in          P₇ decoded
9     B₈ coded   B₈           Accesses to P₇             Accesses to P₁₀            B₈ decoded
10    B₉ coded   B₉           Accesses to P₇             Accesses to P₁₀            B₉ decoded
11    I₁₃ coded  I₁₃          I₁₃ is written in          P₁₀ is read out            P₁₀ decoded
12    B₁₁ coded  B₁₁          Accesses to I₁₃            Accesses to P₁₀            B₁₁ decoded
13    B₁₂ coded  B₁₂          Accesses to I₁₃            Accesses to P₁₀            B₁₂ decoded
14    P₁₆ coded  P₁₆          I₁₃ is read out            P₁₆ is written in          I₁₃ decoded
15    B₁₄ coded  B₁₄          Accesses to I₁₃            Accesses to P₁₆            B₁₄ decoded
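The alternating use of the two frame stores in Table 1 can be reproduced with a short Python sketch. The function `simulate_two_stores` and its data layout are hypothetical and serve only to illustrate the prior-art allocation: each new reference frame overwrites the store holding the older reference, while the previously decoded reference is read out for display.

```python
def simulate_two_stores(coded_order):
    """Replay the prior-art allocation of two frame stores (A and B).

    Returns one tuple per step: (step, decoded frame, store contents, output).
    Illustrative model only; format conversions in the output memory are ignored.
    """
    stores = {"A": None, "B": None}
    newest = None  # name of the store holding the most recently decoded reference
    rows = []
    for step, frame in enumerate(coded_order, start=1):
        if frame[0] in "IP":
            # The previously decoded reference is read out for display ...
            output = stores[newest] if newest is not None else None
            # ... and the new reference overwrites the other (older) store.
            target = "B" if newest == "A" else "A"
            stores[target] = frame
            newest = target
        else:
            # B-frames only access both stores and are output immediately.
            output = frame
        rows.append((step, frame, dict(stores), output))
    return rows

rows = simulate_two_stores(["I1", "P4", "B2", "B3", "P7", "B5", "B6"])
for step, frame, contents, output in rows:
    print(step, frame, contents, output)
```

The printed output order (I1, B2, B3, P4, B5, B6) matches the "Output of" column of Table 1, with the most recent reference frame still waiting in its store.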

The invention will now show a way of further reducing the memory requirement, with the result that fewer than two complete frame stores are required for carrying out the bidirectional interpolation. In the case of the methods according to the prior art, the data stream contained in the buffer BF is, as shown in FIG. 1, evaluated by a device HA for header evaluation to the extent that individual macroblocks of all the coded frames can subsequently be subjected to intra-frame decoding, preferably in the form of inverse quantization INVQ and inverse transformation IDCT. In accordance with the present invention, a respective reference frame is kept, as shown in FIG. 2, in compressed form in a frame store BSP2, which is written to and read from via an access table ZGT. From these descriptions, it is clear to the person skilled in the art that the store BSP2 does not necessarily have to be physically different from the memory areas of the buffer BF, but that the store BSP2 may also be merely a logically different form of access to the data of the buffer BF. However, the respective other reference frame is used, after having been completely decoded, for the picture synthesis BSYN1,2 and is stored in uncompressed form in a complete frame store BSP1.

A P-frame Pi is always predicted unidirectionally (forwards) from the preceding I- or P-frame (I(i−b−1) or P(i−b−1)), b designating the number of B-frames between two reference frames. It is therefore sufficient to store the first reference frame in an uncompressed form if sufficient computing power is available to calculate in real time the areas, required for interpolation, of the missing P-frame Pi. This P-frame Pi can then remain compressed until it has to be output itself and serves as a basis for the following P-frame P(i+b+1) (see FIG. 2). An I-frame I(i) can also remain compressed until it has to be output itself and serves as a basis for the following P-frame.

The movement information for the prediction is specified for individual macroblocks. A macroblock consists of 16 times 16 luminance pixels plus 2 times 8 times 8 chrominance pixels in the so-called 4:2:0 format. If even one macroblock in the B-frame contains a movement vector with regard to the P-frame Pi, then the associated macroblocks of the P-frame must be decompressed for the movement compensation. The movement vector (for example in the case of frame prediction) will generally refer to a 16 times 16 block, which does not have to be situated in the raster of the original macroblocks but rather may be displaced in the half-pixel raster (FIG. 3). Therefore, four macroblocks (or at least 9 blocks each of 8 times 8 luminance pixels plus two times 4 blocks each of 8 times 8 chrominance pixels) normally have to be completely decompressed for the backward prediction.
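The count of four macroblocks, or nine luminance blocks of 8 times 8 pixels, can be checked with a small Python sketch. The function `overlapped_cells` is hypothetical. It assumes that the movement vector is given in half-pixel units, that a half-pixel component requires one additional column or row of pixels for the interpolation, and that the displaced area lies entirely inside the picture.

```python
MACROBLOCK = 16  # luminance macroblock size in pixels

def overlapped_cells(mb_col, mb_row, mv_x_half, mv_y_half, cell=16):
    """Return (col, row) indices of the cell x cell raster cells that a
    displaced 16 x 16 luminance area touches in the reference frame.

    mv_x_half, mv_y_half: movement vector in half-pixel units (assumption).
    cell: 16 for macroblocks, 8 for individual luminance blocks.
    """
    # Top-left whole-pixel position; floor division also handles negative vectors.
    x0 = mb_col * MACROBLOCK + mv_x_half // 2
    y0 = mb_row * MACROBLOCK + mv_y_half // 2
    # A half-pixel component needs one extra column/row for interpolation.
    width = MACROBLOCK + (mv_x_half % 2)
    height = MACROBLOCK + (mv_y_half % 2)
    cols = range(x0 // cell, (x0 + width - 1) // cell + 1)
    rows = range(y0 // cell, (y0 + height - 1) // cell + 1)
    return [(c, r) for r in rows for c in cols]

aligned = overlapped_cells(2, 2, 0, 0)           # zero vector: raster-aligned
displaced = overlapped_cells(2, 2, 3, 5)         # 1.5 pixels right, 2.5 pixels down
blocks = overlapped_cells(2, 2, 3, 5, cell=8)    # same area on the 8 x 8 raster
print(len(aligned), len(displaced), len(blocks))
```

Only a vector that keeps the area aligned with the macroblock raster touches a single macroblock; a general displacement touches four macroblocks, of which nine 8 times 8 luminance blocks are actually needed.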

Therefore, the necessary computing speed is four times higher in the case of backward prediction compared with conventional MPEG decoding described at the beginning (in the case of frame prediction). When decompressing these four macroblocks, access is made to four 16 times 16 frame blocks (or in the worst case, for 16 times 8 prediction in the field mode and for half-pixel accuracy: eight 17 times 9 picture areas). This access is not problematic for the computing power, since the corresponding video data are stored in an uncompressed form. Moreover, a frame block of the first reference frame (I(i−b−1) or P(i−b−1)) stored in an uncompressed form must be read for the forward prediction. The necessary memory bandwidth is increased thereby to approximately 750 Mbit/s, since a 16 times 16 frame block for the forward prediction and four 16 times 16 frame blocks for the backward prediction normally have to be read for each macroblock. This bandwidth problem may be alleviated by integration of the memory on the processor chip.

A prerequisite for the application of the invention is the possibility of direct access to individual frame blocks of the reference frames. Therefore, the compressed data must be evaluated (HA) with regard to the addresses of the macroblocks in the memory. These addresses must be stored in the form of a table (ZGT). In addition, any existing mutually differential codings of the macroblocks must be reversed.
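A minimal Python sketch of such an access table is given below. The names `TableEntry` and `build_access_table` are hypothetical, and the model is heavily simplified: it assumes that the header evaluation has already delivered the length in bits of each coded macroblock and one differentially coded DC value per macroblock, whereas a real bit stream carries separate predictors per color component that are reset at slice boundaries.

```python
from dataclasses import dataclass
from itertools import accumulate

@dataclass
class TableEntry:
    bit_offset: int  # position of the macroblock within the buffer, in bits
    dc_value: int    # absolute DC value, differential coding already reversed

def build_access_table(bit_lengths, dc_differences, dc_reset=0):
    """Build one entry per macroblock (simplified, illustrative model).

    bit_lengths: coded size of each macroblock in bits (assumed known
                 after the header evaluation HA).
    dc_differences: DC value of each macroblock relative to its predecessor.
    dc_reset: predictor value in force before the first macroblock.
    """
    # Each macroblock starts where the previous one ended.
    offsets = [0] + list(accumulate(bit_lengths))[:-1]
    # Reversing the differential coding is a running sum of the differences.
    dc_values = [dc_reset + s for s in accumulate(dc_differences)]
    return [TableEntry(o, d) for o, d in zip(offsets, dc_values)]

table = build_access_table([5, 3, 7], [4, -1, 2], dc_reset=128)
for index, entry in enumerate(table):
    print(index, entry)
```

With absolute values stored per entry, any macroblock can be decoded on its own, without first decoding the macroblocks that precede it in the bit stream.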

As soon as the P-frame Pi has to be output, it is also stored in an uncompressed form. Therefore, it is necessary that enough frame store memory is made available to ensure that no video data, required for the decompression, of the first reference frame are overwritten too early. Therefore, at least the maximum possible search area to which the movement vectors can be referenced must remain in the store. If the movement vector fields of the P-frame Pi are evaluated in advance, it is possible to reduce the necessary memory space even further as a result of the fact that areas which are no longer required are overwritten early. In this case, the access table ZGT can, in turn, be used for decompression of the macroblocks in any order. It is emphasized again that the memory requirement for the compressed reference frame BSP2 can be regarded, together with the buffer BF which is present anyway, as a unit in the sense that the compressed data do not have to be kept additionally in a further physical store BSP2, but that the store BSP2 may also represent merely a logic form of access to data stored anyway in the buffer BF.

For the purpose of reconstructing a B-frame between two P- or I-frames (generally between two reference frames) only those picture areas which are instantaneously required for the video output are decoded from the compressed data stream. Since the video data are produced in a block-by-block manner, but are reproduced in a line-by-line manner, the output memory must be able to accommodate at least two block lines. In order, moreover, to be able to output pictures which are coded as frames using the line interlacing method, that is to say as two fields, a field store must be provided (re-interlacing).

In comparison with the currently known methods, according to the present invention both the B-frame data and the data of the following reference frame must be decoded in the case of every B-frame. Therefore, either faster decoding hardware is necessary, or this hardware must be multiply replicated for parallel processing. However, the additional chip surface area required in the case of duplicating the decoding hardware is, possibly, less than the gain in chip surface area as a result of the reduction in the required storage capacity.

When using the currently known methods for decoding, a total storage capacity of approximately 16 Mbits is required for the output memory and input buffer in order to store the reference frames in the CCIR 601 format. Using the method according to the present invention, the maximum requirement is reduced to 4.75 Mbits for the first reference store, 2 Mbits for the input buffer including 600 kbits for an additional overflow reserve, 1.75 Mbits for the compressed reference frame, 34 kbits for the access table and 270 kbits for the storage of two macroblock lines in the output memory. Therefore, it is possible to reduce the memory requirement to approximately 11.6 Mbits or less with the aid of the method described here.
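Two of the figures quoted above can be reproduced with a short calculation. The following Python sketch assumes a CCIR 601 picture of 720 times 576 pixels in the 4:2:0 format with 8 bits per sample (12 bits per pixel on average) and binary units (1 Mbit = 2^20 bits, 1 kbit = 2^10 bits); these parameters are assumptions made for this illustration and are not stated explicitly in the text.

```python
MBIT = 1 << 20  # binary megabit (assumption)
KBIT = 1 << 10  # binary kilobit (assumption)

WIDTH, HEIGHT = 720, 576   # assumed CCIR 601 picture size (625-line system)
BITS_PER_PIXEL = 12        # 8 bits luminance + 4 bits chrominance on average (4:2:0)
MACROBLOCK = 16            # height of one macroblock line in pixels

# One complete uncompressed reference frame.
frame_bits = WIDTH * HEIGHT * BITS_PER_PIXEL

# Output memory holding two macroblock lines across the full picture width.
two_mb_lines_bits = 2 * MACROBLOCK * WIDTH * BITS_PER_PIXEL

print(f"reference frame store: {frame_bits / MBIT:.2f} Mbits")
print(f"two macroblock lines:  {two_mb_lines_bits / KBIT:.0f} kbits")
```

Under these assumptions the sketch yields about 4.75 Mbits for the reference frame store and 270 kbits for the two macroblock lines, in agreement with the values given above.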

The procedure for decoding can be described as follows (in the case of the invention being applied to an encoder, the decoder section of the encoder having to be implemented appropriately): the data stream of a group of pictures begins with header information (GOP header). The data for an I-frame then follow. The first reference frame is reconstructed from these data and these video data are stored in the first reference frame store (reference store). The data for the P-frame follow, if present. Although these data are evaluated, the second reference frame is not immediately reconstructed if B-frames are present; rather, those points at which the header information (picture header, slice header) and the macroblock information are located are merely recorded in an access table (ZGT). There must be an entry for every macroblock in the access table.

FIG. 2 shows a simplified block diagram of a decoder with a reduced memory requirement according to the present invention. In this case, it is essential that the data for the second reference frame store (reference store) are kept in a compressed form in the store BSP2, and that access can be made to these data at the slice and macroblock level (the stores BSP2 and BF may in this case be implemented as a physical store). In order to make this possible, the associated addresses must be recorded in an access table. After the analysis of the data for the P-frame, there follow the data for the first B-frame, which is the second frame to be displayed within the group of pictures. The individual macroblocks of the B-frames are, as already described above, reconstructed from the movement-compensated bidirectional interpolation starting from adjacent reference frames and the transmitted prediction error. This means that a block of 16 times 16 pixels which is displaced with respect to the macroblock to be reconstructed must be available in each case from the 1st and the 2nd reference frame (see FIG. 3). Whereas the required video data of the 1st reference frame are already available in the 1st reference frame store BSP1, the corresponding data for the second reference frame store must first be reconstructed. As is evident in FIG. 3, these are generally four macroblocks, the position of which in the reference frame results from the position of the macroblock to be reconstructed in the B-frame and of the associated movement vector. The data necessary for this reconstruction are taken from the store BSP2, and the associated addresses are located in the access table (ZGT). Accordingly, four macroblocks of the 2nd reference frame normally have to be reconstructed in each case in order to reconstruct each macroblock in a B-frame. The reconstructed B-frames are output via the outputmemory.
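The final synthesis step for one block of a B-frame can be sketched in Python as follows. The function `interpolate_block` is hypothetical. It assumes that the two displaced prediction blocks have already been fetched (from the store BSP1 and from the freshly decompressed macroblocks of the second reference frame), that the interpolation is a rounded average of the two predictions as in MPEG-type bidirectional prediction, and that the result is limited to the 8-bit range; the text itself does not spell out this formula.

```python
def interpolate_block(forward, backward, error):
    """Combine two prediction blocks and the prediction error (sketch).

    forward, backward: displaced blocks from the 1st and 2nd reference frame,
                       given as lists of rows of 8-bit sample values.
    error: decoded prediction error with the same shape (signed values).
    """
    result = []
    for f_row, b_row, e_row in zip(forward, backward, error):
        row = []
        for f, b, e in zip(f_row, b_row, e_row):
            predicted = (f + b + 1) // 2                    # rounded average (assumption)
            row.append(min(255, max(0, predicted + e)))     # limit to 0..255
        result.append(row)
    return result

# Tiny 1 x 2 "block" to keep the example readable.
block = interpolate_block([[100, 200]], [[101, 250]], [[-3, 40]])
print(block)
```

In a decoder following the method described here, only the `backward` input requires on-demand decompression; the `forward` input is read directly from the uncompressed first reference frame.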

Once all of the B-frames between two reference frames have been processed and reproduced, the entire second reference frame is reconstructed anew, on the one hand for the video output and on the other hand so that it can serve as the first reference frame for the next cycle (P B B . . . or I B B). The data for this are initially stored in the memory area which was previously used for the output memory. Since this area is too small to accommodate the entire reference frame (field store), the area which was previously used for the first reference frame in the preceding cycle is subsequently overwritten. The memory area which is no longer required after this reconstruction is used as the output memory for the next frames (see FIG. 4).

The memory area which was previously used for the 1st reference frame store must never be overwritten directly, since the data are still initially required for the reconstruction of the predicted frame. Owing to the movement compensation, it is still possible, even during the reconstruction of a block line lower down, to have recourse to data located at a point which is higher by the maximum value of the vertical component of the movement vector (127 lines in the case of ISO/IEC 13818-2). This means that this area must always be kept ready, even if the output memory should be reduced still further by other measures.
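The resulting constraint on overwriting the old reference frame can be expressed as a simple bound. The Python sketch below uses the hypothetical function `lines_free_to_overwrite` and assumes that block lines are reconstructed from top to bottom, that a block line is 16 picture lines high, and that the maximum vertical displacement is 127 lines as quoted above; any additional line needed for half-pixel interpolation would have to be added to this margin.

```python
MAX_VERTICAL_DISPLACEMENT = 127  # lines, value quoted in the text

def lines_free_to_overwrite(block_line, block_height=16,
                            max_displacement=MAX_VERTICAL_DISPLACEMENT):
    """Number of lines at the top of the old reference frame that are no
    longer reachable by any movement vector once reconstruction of the new
    reference frame has reached the given block line (illustrative bound).
    """
    top_line = block_line * block_height
    # Everything above (top_line - max_displacement) can no longer be referenced.
    return max(0, top_line - max_displacement)

for block_line in (0, 7, 8, 10):
    print(block_line, lines_free_to_overwrite(block_line))
```

Under these assumptions nothing may be overwritten until the eighth block line has been reached; from then on one further line of the old reference frame becomes free for each line of progress.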

The invention is not limited to the particular details of the method and apparatus depicted and other modifications and applications are contemplated. Certain other changes may be made in the above described method and apparatus without departing from the true spirit and scope of the invention herein involved. It is intended, therefore, that the subject matter in the above depiction shall be interpreted as illustrative and not in a limiting sense.

What is claimed is:
 1. A method for decoding compressed video data with a reduced memory requirement in accordance with the principle of differential pulse code modulation coding with movement-compensated prediction, comprising the steps of: determining interpolated frames from a chronologically preceding reference frame and a chronologically succeeding reference frame using bidirectional movement-compensated interpolation; completely decompressing only one first reference frame of the chronologically preceding reference frame and the chronologically succeeding reference frame in order to reconstruct an interpolated frame; uncompressing only picture areas of the other second reference frame of the chronologically preceding reference frame and the chronologically succeeding reference frame which overlap with a picture area to be determined of the interpolated frame, taking into account displacement corresponding to movement compensation.
 2. The method according to claim 1, wherein, for reconstructing and outputting the second reference frame, after the interpolated frames are output, a memory space provided for video output is initially used to store at least part of the second reference frame in an uncompressed form.
 3. A method for decoding compressed video data with a reduced memory requirement in accordance with the principle of differential pulse code modulation coding with movement-compensated prediction, comprising the steps of: determining interpolated frames from a chronologically preceding reference frame and a chronologically succeeding reference frame using bidirectional movement-compensated interpolation; completely decompressing only one first reference frame of the chronologically preceding reference frame and the chronologically succeeding reference frame in order to reconstruct an interpolated frame; compressing only picture areas of the other second reference frame of the chronologically preceding reference frame and the chronologically succeeding reference frame which overlap with a picture area to be determined of the interpolated frame, taking into account displacement corresponding to movement compensation; reconstructing and outputting the second reference frame, after the interpolated frames are output, a memory space provided for video output is initially used to store at least part of the second reference frame in an uncompressed form; initially using a memory space, provided for the video store, to store at least part of the second reference frame in an uncompressed form for reconstructing and outputting the second reference frame; using a store, after the message space is used, for the first reference frame to store a remainder of the second reference frame in an uncompressed form; using a remainder of the store for the first reference frame, which is no longer required, as an output memory for succeeding frames; and the memory area, which has now accommodated the completely uncompressed second reference frame, containing the first reference frame for the decoding of the chronologically succeeding interpolated frames.